Abstract

A fast simulated annealing algorithm is developed for automatic object recognition. The object recognition 
problem is addressed as the problem of best describing a match between a hypothesized object and an 
image. The normalized correlation coefficient is used as a measure of the match. Templates are generated 
on-line during the search by transforming model images. Simulated annealing reduces the search time by 
orders of magnitude with respect to an exhaustive search. The algorithm is applied to the problem of 
how landmarks, for example, traffic signs, can be recognized by an autonomous vehicle or a navigating 
robot. Images are assumed to be taken while the robot or the vehicle is moving through its environment. 
It tries to match them with templates created online from models stored in a database. We illustrate 
the performance of our algorithm with real-world images of complicated scenes with traffic signs. False 
positive matches occur only for templates with very small information content. To avoid false positive 
matches, we propose a method to select model images for robust object recognition by measuring the 
information content of the model images. The algorithm works well in noisy images for model images with 
high information content. 
1 Introduction

The field of automated object recognition is one of the most complex areas in computer vision
and image understanding. Object recognition based on matched filtering has been a very active
research area in computer vision for many years. Matched filtering was used much earlier in
the areas of radar, sonar, and signal processing [Opp78]. Valuable information for visual
object recognition can be obtained from that literature.

Although template matching has been widely used in 
computer vision [BB82, Yar85], a crucial problem with 
the method is the size of the search space [MR90, LC88]. 
There are several approaches published in the literature 
that either reduce the size of the search space or that 
direct the search towards areas in the search space for 
which a match is more likely [Gri88, Gri90, NR72, MR90, 
AF86]. In this paper a new approach is proposed that 
uses both such techniques. We discuss the problem of 
how certain landmarks, for example traffic signs, can be 
recognized by an autonomous vehicle or robot. For this 
particular application, a five-dimensional search-space is 
sufficiently large for robust object recognition and small 
enough for efficient object recognition. 

The method presented constructs templates on-line 
during the search. The algorithm uses an efficient local 
definition of the correlation coefficient to evaluate the 
match. The algorithm presented correctly finds the location,
shape, size, and orientation of objects. If enough independent
information is contained in a template image,
it can be matched with an object in an image uniquely.
False positive matches occur only for objects that have 
very small information content. To avoid false matches, 
templates with insufficient information content should 
not be used for recognition tasks. We describe how to 
compute the information content of template images. 

Although the main objective of this paper is to describe a new approach to the general problem
of visual object recognition, the solution to the special problem of recognizing traffic signs
is significant by itself. Automatically recognizing traffic signs in images is very valuable
for mobile robot or autonomous vehicle navigation. A robot that can recognize a traffic sign
as a familiar landmark in its map of the environment can then use this information to localize
itself in its environment [BG94, Bra90]. Our method stands apart from previous approaches to
traffic sign recognition because first, it is efficiently applied to real-world landscape
images (as opposed to Ettinger’s isolated signs [Ett88]), and second, it does not rely on color
perception, which is very sensitive to lighting changes. This sensitivity limits the approaches
of May [May94] and Zheng et al. [ZRJ94], who address the problem of recognizing traffic signs
using color information.

The optimization technique fast simulated annealing is applied to avoid the cost of brute-force
search by directing the search successfully. It reduces the search time by orders of magnitude.
Recent publications in the sonar literature [CBK+93, KCPD90] show that fast simulated annealing
has been very successful in coherent signal extraction and localization in noisy environments.
We use it in a similar way for incoherent image processing. Kirkpatrick et al. [KGV83] show how
to implement a Metropolis algorithm [MRR+53] to simulate annealing of combinatorial optimization
problems. Szu and Hartley [SH87] propose an inverse linear cooling schedule for simulated
annealing. This version is called “fast simulated annealing.” The original slower version of
simulated annealing has been applied to segmentation and noise reduction of degraded images by
Geman and Geman [GG84], to the representation of lobed objects by Friedland and Rosenfeld [FR91],
and to boundary detection by Geman et al. [GGGD90]. However, for visual object recognition,
fast simulated annealing has not yet been exploited.

This paper is organized in the following way: The object recognition problem is defined as a
parameter search problem in Section 2. Section 3 shows how templates are generated from model
images. Section 4 examines the search space of the recognition problem and introduces
“ambiguity surfaces.” Section 5 describes our simulated annealing algorithm and Section 6
reports our experimental results. Section 7 analyzes the error in the correlation and proposes
how to avoid false matches. Section 8 describes our results on noisy images. We conclude with
a summary of this work and suggestions for how to apply these results to other problems.

2 The Recognition Problem 

An object in an image $I$ is defined to be recognized if it correlates highly with a template
image $T$ of the hypothesized object. This template image $T$ is a transformed version of the
model of the hypothesized object. Model images of objects are stored in a library. Section 3
shows how to compute the template from the model. A template $T(x,y)$, for $0 \le x < n_T$,
$0 \le y < m_T$, is generally much smaller than the image $I(x,y)$. The template is compared
with the part $I_T(x,y)$ of image $I(x,y)$ that contains the hypothesized object. Assuming
pixel $(x_0, y_0)$ is at the lower-left corner of the hypothesized object in $I$, subimage
$I_T$ is defined to be

$$I_T(x,y) = I(x_0 + x,\ y_0 + y) \quad \text{for } 0 \le x < n_T,\ 0 \le y < m_T.$$

We use the normalized correlation coefficient as a measure of how well images $I_T$ and $T$
correlate or match. For images $I_T$ and $T$, the normalized correlation coefficient $\rho$ is
the covariance of $I_T$ and $T$ normalized by the standard deviations of $I_T$ and $T$. The
correlation coefficient is dimensionless, and $|\rho| \le 1$. The correlation coefficient
measures how accurately image $I_T$ can be approximated by template $T$. Image $I_T$ and
template $T$ are perfectly correlated if $\rho = 1$. We approximate $\rho$ using the sampled
coefficient of correlation

$$r = \frac{p_T \sum_{x,y} I_T(x,y)\,T(x,y) - \left(\sum_{x,y} I_T(x,y)\right)\left(\sum_{x,y} T(x,y)\right)}{\sigma_{I_T}\,\sigma_T}$$

where $\sigma_{I_T} = \sqrt{p_T \sum_{x,y} I_T(x,y)^2 - \left(\sum_{x,y} I_T(x,y)\right)^2}$,
$\sigma_T = \sqrt{p_T \sum_{x,y} T(x,y)^2 - \left(\sum_{x,y} T(x,y)\right)^2}$, and $p_T$ is
the number of pixels in the template image $T$ with nonzero brightness values and
$p_T \le n_T \cdot m_T$. Note this last condition means that not all the pixels in images $T$
and $I_T$ are actually compared but only the nonzero pixels in $T$ with the corresponding
pixels in $I_T$. This is important, for example, if the template contains a circular object.
Here pixels in $T$ bordering the circle (or the background) will be zero (black). The
computation time of $r$ is proportional to the number of pixels in the hypothesized object,
which is usually much smaller than the number of pixels in $I$. Using the correlation as a
measure of successful recognition is also advantageous because it is a very robust measure.
That is, it is relatively insensitive to fluctuations in the environment compared to higher
resolution methods, as is well documented in spectral, bearing, and range estimation
problems [Joh82, BKM93].
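
The sampled coefficient of correlation defined above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in the experiments: the function name `sampled_correlation` is hypothetical, the window and template are assumed to be arrays of equal shape, and returning 0 for a constant window or template is an assumption made to avoid a division by zero.

```python
import numpy as np

def sampled_correlation(window, template):
    """Sampled correlation r between an image window I_T and a template T.

    Only pixels where the template is nonzero take part in the sums.
    """
    mask = template != 0
    p = int(mask.sum())                       # p_T: nonzero template pixels
    i = window[mask].astype(float)
    t = template[mask].astype(float)
    numerator = p * np.sum(i * t) - np.sum(i) * np.sum(t)
    # Clamp at zero to guard against tiny negative values from rounding.
    var_i = max(p * np.sum(i * i) - np.sum(i) ** 2, 0.0)
    var_t = max(p * np.sum(t * t) - np.sum(t) ** 2, 0.0)
    denominator = np.sqrt(var_i * var_t)
    # Assumption: a constant window or template is reported as r = 0.
    return float(numerator / denominator) if denominator > 0 else 0.0
```

Because the mask is taken from the template, changing window pixels that lie under zero-valued template pixels leaves the result unchanged.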


3 Generating Templates from Model Images

A template $T(x,y)$ is generated from a model image $M(x,y)$ by choosing three parameters that
describe a transformation from $M$ into $T$. The parameters determine how the model is sampled,
and if necessary, how it is interpolated to generate the template. The parameters used are a
rotation parameter $\tau$ and two sampling parameters $s_x$ and $s_y$.

For notational convenience, we define the origin of a coordinate system for model image
$M(x,y)$ to be in the middle of the image, i.e., $M(x,y)$ is defined for
$-(n_M - 1)/2 \le x \le (n_M - 1)/2$ and $-(m_M - 1)/2 \le y \le (m_M - 1)/2$ for $n_M$, $m_M$
odd. Then the rotation parameter $\tau$ determines how the $x$ and $y$ axes of $M(x,y)$ are
rotated to define the $x$ and $y$ axes of $T(x,y)$. More precisely, given vectors

$$m_x = \left(\frac{n_M - 1}{2},\ 0\right) \quad \text{and} \quad m_y = \left(0,\ \frac{m_M - 1}{2}\right),$$

which lie on the coordinate axes of $M$, and model radius
$R_M = \sqrt{\left(\frac{n_M - 1}{2}\right)^2 + \left(\frac{m_M - 1}{2}\right)^2}$, we compute
vectors

$$t_x = R_M (\cos\tau,\ \sin\tau) \quad \text{and} \quad t_y = R_M (-\sin\tau,\ \cos\tau),$$

which define the coordinate axes of the template image $T$ in continuous space. The axes of $T$
always span the model object as shown in Figure 1.

The sampling parameters $s_x$ and $s_y$ determine how many samples along vectors $t_x$ and
$t_y$ are used for the template image, respectively. The spacing between the samples along
$t_x$ is $((n_M - 1)/2)/s_x$. If there is a pixel in $M(x,y)$ after every $(n_M - 1)/(2 s_x)$
step along $t_x$, its brightness is used to define $T$ along its $x$-axis. For example, this
scenario may occur if $\tau = 45$ degrees and $s_x = (n_M - 1)/2$. As shown in Figure 1, if
$s_x = (n_M - 1)/4$ the model is down-sampled and transformed into a template that is about
one-quarter the size of the model. Pixels of zero brightness are added where necessary as shown
in Figure 1.

In general, there may not be a pixel in $M$ at the sampling point on vector $t_x$. If this is
the case, we use a four-point interpolation to define the brightness for the template at that
point. Similarly, $M$ is sampled (and if necessary interpolated) along vectors $t_y$, $-t_x$,
and $-t_y$ to obtain the brightness of the template pixels along the template coordinate axes.
The rest of the template is now determined from $M$ along the grid that is defined by the
samples on the template coordinate axes.
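
The sampling scheme described above can be sketched as follows. This is a rough sketch under several assumptions that go beyond the text: the function name `make_template` is hypothetical, samples are placed at the grid points $(i/s_x)\,t_x + (j/s_y)\,t_y$ for $-s_x \le i \le s_x$ and $-s_y \le j \le s_y$, the four-point interpolation is taken to be bilinear, and the model is indexed as `model[y, x]`. Sampling points that fall outside the model receive zero brightness.

```python
import numpy as np

def make_template(model, tau, sx, sy):
    """Sample a (2*sy+1) x (2*sx+1) template from an odd-sized model image."""
    m_m, n_m = model.shape                       # rows (y) and columns (x)
    cx, cy = (n_m - 1) / 2.0, (m_m - 1) / 2.0    # model centre = origin
    radius = np.hypot(cx, cy)                    # model radius R_M
    tx = radius * np.array([np.cos(tau), np.sin(tau)])
    ty = radius * np.array([-np.sin(tau), np.cos(tau)])
    template = np.zeros((2 * sy + 1, 2 * sx + 1))
    for j in range(-sy, sy + 1):
        for i in range(-sx, sx + 1):
            x, y = (i / sx) * tx + (j / sy) * ty
            # Shift to array coordinates; rounding removes floating-point fuzz.
            u, v = round(x + cx, 9), round(y + cy, 9)
            if not (0 <= u <= n_m - 1 and 0 <= v <= m_m - 1):
                continue                         # outside the model: stays zero
            u0 = min(int(u), n_m - 2)
            v0 = min(int(v), m_m - 2)
            fu, fv = u - u0, v - v0
            # Bilinear (four-point) interpolation between neighbouring pixels.
            template[j + sy, i + sx] = (
                (1 - fu) * (1 - fv) * model[v0, u0]
                + fu * (1 - fv) * model[v0, u0 + 1]
                + (1 - fu) * fv * model[v0 + 1, u0]
                + fu * fv * model[v0 + 1, u0 + 1]
            )
    return template
```

With a 9 x 9 model, `tau = np.pi / 4`, and `sx = sy = 2`, this produces a 5 x 5 template whose corner pixels are zero, the configuration described for Figure 1.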

Since the sampling rates $s_x$ and $s_y$ in the template coordinate system are different in
general, the template is a rotated, scaled, and uniformly deformed version of the model. More
parameters would be needed to describe more general non-uniform and non-linear deformations of
the model. A straightforward extension would be to add a fourth parameter to obtain a
non-uniform linear deformation of the model. However, for our purposes, the transformation
described is sufficient because the objects to be recognized are usually flat, normal to the
viewing direction, and far away from the camera compared to the object size. Our method
computes the template very quickly by sweeping over the model image only once. The time for
creating an $n_T \times m_T$ template image is $O(n_T m_T)$.

Examples of a model and corresponding transformed templates are shown in Figure 2. The first
two templates are scaled by $s_x = s_y$ and are not rotated. The remaining templates in
Figure 2 are defined by more general transformations with $s_x \ne s_y$.

4 The Parameter Search Space 

The space of possible solutions of the recognition problem is extremely large, even if a
particular object is known to be in the image a priori. The dimension of the search space is
determined by the number of possibilities for position, size, shape, and orientation of the
object. The number of possibilities for the position of the centroid of the object in the
image is $O(n^2)$ for an $n \times n$ image. Assuming that the size and shape of the object can
be approximated by sampling the model along two perpendicular axes as described in the previous
section, the number of possibilities to approximate the size and shape of the object is also
$O(n^2)$. Even with this assumption, the number of possible angles is still very large; since
the image is discrete, we assume that the number of possible angles is $O(n)$. Thus, the size
of the search space is $O(n^5)$ for an $n \times n$ image. For a typical image of size
256 x 256, the search space has a size of order $10^{14}$. An exhaustive search of this space
would take too long to find a good match between templates and images.

We use terminology from the radar and sonar literature to describe the search space. We call
the space an ambiguity surface. A peak in the ambiguity surface means that the correlation
coefficient is high for a particular set of parameters. Figure 3 shows an example of a
two-dimensional ambiguity surface with a peak shown in black. There may be several peaks in an
ambiguity surface. If the template and the object in the image match perfectly, the
cross-correlation between template and image results in a peak in the ambiguity surface which
is the global optimum. Due to noise and reduction of the search space by our template
transformation, we do not expect a perfect match. However, in most cases the global optimum
corresponds to a correct match or recognition.

Figure 1: A 5 x 5 template image is obtained from a 9 x 9 model image using parameters
$s_x = s_y = 2$ and $\tau = 45$ degrees.

Figure 2: Model of slow sign with 101 x 111 pixels, and six templates of slow sign. Templates
are obtained by sampling model sign at various sampling rates and degrees of rotation.

Figure 3: On the left, image Slow3. On the right, the ambiguity surface of image Slow3 computed
for all possible translations given fixed angle and scaling parameters. A deterministic search
would compute each value on this surface. A steepest descent procedure would fail because of
local minima. Therefore, a stochastic search is used to find the best correlation value (here
the darkest pixel value).

As we can also see in Figure 3, an iterative search for 
a peak in the ambiguity surface such as steepest descent 
would fail because it would get “stuck” in local minima. 
Simulated annealing, however, is able to “jump” out of 
local minima and find the globally best correlation value. 

5 The Simulated Annealing Algorithm 

In this section we describe our algorithm for finding an optimal match between images and
templates. Our algorithm is based on a fast version of simulated annealing. Simulated annealing
has become a popular search technique for solving optimization problems. Its name originates
from the process of slowly cooling molecules to form a perfect crystal. The cooling process and
its analogous search algorithm are iterative processes, controlled by a decreasing temperature
parameter. At each iteration, our algorithm generates templates on-line as described in
Section 3. New test values for the location, sampling, and rotation parameters of the template
are randomly perturbed from current values. If the correlation coefficient $r_j$ increases over
the previous coefficient $r_{j-1}$, the new parameter values are accepted in the $j$-th
iteration (as in the gradient method). Otherwise, they are accepted if

$$e^{-(E_j - E_{j-1})/T_j} > \xi$$

where $\xi$ is randomly chosen to be in $[0, 1]$, $T_j$ is the temperature parameter, and
$E_j = 1 - r_j$ is the cost function in the $j$-th iteration. For a sufficiently high
temperature this allows “jumps” out of local minima. We choose

$$T_j = T_0 / j, \quad 1 \le j \le L$$

as the cooling schedule for the $j$-th update of the temperature parameter, where $T_0$ is the
initial temperature and $L$ is the number of iterations during the search. Note that the rate
at which the temperature decreases is inverse linear as first proposed by Szu and Hartley
[SH87] and converges faster than the often used inverse logarithmic cooling schedule [GG84].
As a criterion for stopping the annealing process, we simply put a limit on the search
length $L$. Although this does not ensure convergence to the optimal correlation coefficient,
the solutions we obtain for the parameters are generally sufficient and solve the recognition
task.
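
The acceptance rule and the inverse linear cooling schedule above can be sketched generically. This is a minimal sketch, not the authors' code: `fast_anneal`, `cost`, and `perturb` are hypothetical names, `cost` stands in for $E = 1 - r$, and remembering the best state seen so far is an addition that the text does not describe.

```python
import math
import random

def fast_anneal(cost, perturb, start, t0=1.0, iterations=1000, seed=0):
    """Minimise `cost` with the cooling schedule T_j = T_0 / j."""
    rng = random.Random(seed)
    current, e_current = start, cost(start)
    best, e_best = current, e_current            # addition: keep best-so-far
    for j in range(1, iterations + 1):
        temperature = t0 / j                     # inverse linear cooling
        candidate = perturb(current, rng)
        e_candidate = cost(candidate)
        delta = e_candidate - e_current
        # Accept improvements outright; accept worse states with
        # probability exp(-delta / T_j), compared against xi in [0, 1).
        if delta < 0 or math.exp(-delta / temperature) > rng.random():
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best
```

For the recognition task, `start` would hold the five parameters (centroid position, $s_x$, $s_y$, $\tau$), and `cost` would build the template and return one minus its correlation with the image window.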

As Kuperman et al. [KCPD90] point out, if the search problem involves different kinds of
parameters, the annealing algorithm is rather analogous to the cooling of a mixture of liquids,
each of which has a different freezing point. An algorithm that randomly perturbs all
parameters at the same time has poor convergence properties. Therefore, at a specific
temperature we do not combine the tests for the choice of the location, sampling, and rotation
angle. We also obtain good results using simulated annealing only for the location parameters,
and a gradient descent procedure [CBK+93] for the remaining parameters given large enough
perturbations.

To properly deal with image boundaries of an image $I(x,y)$ for which $0 \le x < n_I$ and
$0 \le y < m_I$, we use the following formula to perturb the $x$-coordinate $c_x$ of the
centroid position of a template with radius $R_T$ in image $I(x,y)$:

$$c_x' = \begin{cases}
c_x & \text{if } c_x - R_T > 0 \text{ and } c_x + R_T < n_I \\
-c_x & \text{if } c_x + R_T < 0 \text{ and } c_x - R_T > -n_I \\
2 n_I - c_x & \text{if } c_x - R_T > n_I \text{ and } c_x + R_T < 2 n_I \\
n_I / 2 & \text{otherwise (unlikely perturbation).}
\end{cases}$$

The $y$-coordinate $c_y$ of the centroid of the template is perturbed similarly. This formula
avoids attracting the centroid position to the rim or corners of the image.
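
The boundary rule can be written as a small function. This is only a sketch of the four cases as they read in the text; the name `reflect_centroid` is hypothetical, and `c` is taken to be the coordinate after the random perturbation has been applied.

```python
def reflect_centroid(c, radius, n):
    """Map a perturbed centroid coordinate c back into an image of extent n."""
    if c - radius > 0 and c + radius < n:
        return c                 # template lies inside the image
    if c + radius < 0 and c - radius > -n:
        return -c                # mirror at the lower image boundary
    if c - radius > n and c + radius < 2 * n:
        return 2 * n - c         # mirror at the upper image boundary
    return n / 2                 # unlikely perturbation: use the image centre
```

For an image extent of 100 and a template radius of 5, a perturbed coordinate of 130 is mirrored to 70, and a coordinate of -20 is mirrored to 20.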

6 Experimental Results 

The algorithm described above was implemented on a 
Sun workstation and on a Silicon Graphics Iris. We used 
the model images shown in Figure 4 to find templates 
that correlate optimally with the scene images shown in 
Figure 5. The images are quantized using 256 grey levels. 
The size of the model images is 122 x 117 pixels (except 
for the one-way sign, which has 178 x 60 pixels.) The size 
of the scene images varies between 100 x 70 and 516 x 365 
pixels. 

For all scene images, the shape, size, orientation, and 
location of any traffic sign is found if it is known a priori 
what kind of sign to look for. For example, using the 
stop sign model shown in Figure 4 the algorithm finds 
the stop sign in a complicated scene image like image 
Stop5. (This is the second image in the last row of images 
in Figure 5; see also Figure 6). The stop sign in scene 
image Stop5 is recognized although the stop sign model 
was constructed from a picture of a completely different 
stop sign. Note that the stop sign in image Stop5 has 
graffiti, while the model sign does not. 

For the more general problem of recognizing which object is in a scene image (i.e., not
knowing the kind of traffic sign a priori), we ran 144 experiments with 18 scene images and
8 model images. Table 1 contains the correlation values obtained in the experiments. For each
scene image, our algorithm computes the highest correlation coefficient among the set of values
obtained for each model (boldface values in Table 1). The model corresponding to the maximum
correlation value is selected as the sign recognized in the scene image. For most scene images,
the correlation coefficient is highest if a match between a sign in the image and its
corresponding template occurs. Only for three images, Slow2, Stop4, and Stop5, does a false
positive match occur because the best correlation coefficient is not the one for the
corresponding model. We show the templates causing these false positive matches in Figure 6.
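
The selection rule described above amounts to taking the maximum over one row of Table 1. The short sketch below uses the Table 1 rows for scene images Footpath and Slow2; the function name `recognize` is hypothetical.

```python
def recognize(correlations):
    """Return the model with the highest correlation for one scene image."""
    return max(correlations, key=correlations.get)

# Rows of Table 1 for two scene images (model name -> correlation value).
footpath = {"Footpath": 0.77, "E-no-entry": 0.59, "No-entry": 0.38,
            "One-way": 0.37, "Priority": 0.46, "Slow": 0.29,
            "Stop": 0.35, "Yield": 0.62}
slow2 = {"Footpath": 0.38, "E-no-entry": 0.48, "No-entry": 0.39,
         "One-way": 0.39, "Priority": 0.32, "Slow": 0.56,
         "Stop": 0.21, "Yield": 0.58}

print(recognize(footpath))  # Footpath: the correct model wins
print(recognize(slow2))     # Yield: the false positive match for Slow2
```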

There are two facts that contribute to the false positive matches. First, some models do not
have enough structure by themselves and match easily with arbitrary parts of the images. For
example, the European no-entry sign’s white middle bar matches with the roof of a car in image
Stop5, as shown in Image 5 of Figure 6. In Section 7 we analyze this problem quantitatively.
Second, some models look quite different from the actual landmark in the scene image. For
example, as mentioned before, the stop sign model does not have any graffiti while the signs in
Stop4 and Stop5 do. The templates constructed from the model stop sign do not match the stop
signs in images Stop4 and Stop5 well enough to result in a correlation coefficient larger than
the one obtained with the model E-no-entry (see Images 4 and 5 of Figure 6). One could try to
solve this problem by making a model of each traffic sign (including its graffiti) in the
environment. However, this would result in a huge library of signs which would increase the
search time substantially. Moreover, the environment may change and outdate the library
quickly. Therefore, we instead propose to select a small number of model images with high
information content (see Section 7) so that false positive matches are avoided.

Figure 4: Model images used in experiments: Footpath, E-no-entry, No-entry, One-way, Priority,
Slow, Stop, and Yield.


6.1 Illumination Changes 

The correlation coefficient $\rho(I_T, T)$ measures not only how accurately image $I_T$ can be
approximated by template $T$, but also how accurately image $I_T$ can be approximated by a
linear function of $T$, since $\rho(I_T, T) = \rho(I_T, aT + b)$ for constants $a > 0$ and $b$.
Therefore, the correlation coefficient is invariant to constant scale factors in brightness.
Thus recognition is not affected by new lighting conditions that mainly result in such
brightness changes.
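
The invariance can be checked numerically. The sketch below is only an illustration on synthetic data: the helper name `correlation`, the random 8 x 8 arrays, and the constants 2.5 and 40 are arbitrary choices, and NumPy's `corrcoef` is used in place of the masked sum of Section 2.

```python
import numpy as np

def correlation(a, b):
    """Pearson correlation coefficient between two equally sized images."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

rng = np.random.default_rng(0)
template = rng.random((8, 8))
window = template + 0.1 * rng.random((8, 8))    # noisy view of the template

r_original = correlation(window, template)
r_rescaled = correlation(window, 2.5 * template + 40.0)   # a = 2.5, b = 40
print(abs(r_original - r_rescaled) < 1e-9)      # the two values agree
```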


6.2 Simulated Annealing vs. Exhaustive Search

We also implemented an exhaustive search of the entire parameter space to compare its running
time to our fast simulated annealing algorithm. The comparison of our simulated annealing
algorithm and exhaustive search drastically demonstrates the advantage of simulated annealing.
We used image Noentry2 which has 112 x 77 pixels. The search space had about 6.8 x 10' sets of
parameters. It took 15 seconds to recognize the sign using our simulated annealing algorithm.
In contrast, exhaustive search found the sign after more than 10 hours of computation time.

Figure 7 illustrates how fast our simulated annealing 
algorithm recognizes a sign in a scene image. 
Figure 7: A typical run of our simulated annealing algorithm. The sign is found after about
300 iterations (ca. 18 s).


7 Avoiding False Matches 


The error in the sampled coefficient of correlation $r$ increases if the number of pixels $p_T$
in the image window considered decreases. For large samples of $p_T$ pixels the error of $r$
can be expressed as the mean squared error (MSE)

$$E[(r - \rho)^2] = \frac{1 - \rho^2}{\sqrt{p_T}}$$

(see Figure 8 and Weatherburn [Wea62]). As Weatherburn points out, the sampling distribution
of $r$ is never even approximately normal. The probability curve is very skewed in the
neighborhood of $\rho = \pm 1$, even for large samples.

The normalized auto-correlation of model image $M(x,y)$ is

$$R(\tau_x, \tau_y) = \frac{\sum_x \sum_y M(x,y)\, M(x - \tau_x,\ y - \tau_y)}{\sum_x \sum_y (M(x,y))^2}.$$

The faster the auto-correlation falls off, the higher the resolution of the model image.
Examples of auto-correlation images are shown in Figure 9.

Figure 5: Scene images used in recognition experiments. The images are named by the sign in the
scene and a number if the same sign is in more than one scene image. Reading left to right, the
images are: Footpath, E-no-entry, No-entry 1 & 2, One-way, Priority 1, 2, & 3, Slow 1, 2, 3,
& 4, Stop 1, 2, 3, 4, & 5, and Yield 1 & 2.

Figure 6: False positive matches: Images 1 and 2 show templates constructed from models Slow
and Yield overlying the sign in image Slow2 (correlation values 0.56 and 0.58, respectively).
Images 3 and 4 are cropped images of Stop4 and Stop5 illustrating the best match with templates
made from the Stop model. For images Stop4 and Stop5, we obtain better correlation values using
models E-no-entry and Yield. Cropped versions of image Stop5 illustrating these false positive
matches are shown in Images 5 and 6.

TABLE 1
Correlation Values for Recognition Task

                                              Models
Images      Footpath  E-no-entry  No-entry  One-way  Priority  Slow        Stop        Yield
Footpath    0.77      0.59        0.38      0.37     0.46      0.29        0.35        0.62
E-no-entry  0.49      0.73        0.39      0.43     0.46      0.26        0.38        0.62
No-entry1   0.22      0.21        0.67      0.31     0.24      0.18        0.17        0.40
No-entry2   0.29      0.18        0.84      0.37     0.14      0.26        0.23        0.35
One-way     0.37      0.55        0.24      0.70     0.40      0.38        0.31        0.58
Priority1   0.36      0.49        0.34      0.35     0.58      0.32        0.30        0.44
Priority2   0.46      0.54        0.40      0.45     0.66      0.29        0.32        0.31
Priority3   0.37      0.57        0.40      0.39     0.62      0.34        0.37        0.56
Slow1       0.25      0.29        0.25      0.25     0.45      0.74        0.15        0.38
Slow2       0.38      0.48        0.39      0.39     0.32      0.56 (2nd)  0.21        0.58
Slow3       0.39      0.58        0.41      0.38     0.40      0.62        0.30        0.59
Stop1       0.41      0.47        0.42      0.30     0.22      0.25        0.69        0.58
Stop2       0.23      0.16        0.27      0.25     0.18      0.11        0.38        0.30
Stop3       0.26      0.20        0.33      0.19     0.13      0.00        0.34        0.19
Stop4       0.42      0.73        0.46      0.50     0.43      0.32        0.56 (3rd)  0.66
Stop5       0.43      0.73        0.44      0.48     0.29      0.31        0.51 (3rd)  0.65
Yield1      0.45      0.75        0.39      0.50     0.53      0.32        0.37        0.78
Yield2      0.42      0.73        0.39      0.50     0.43      0.32        0.36        0.82

Figure 8: Mean squared error of $r$ for $p_T$ = 100, 400 and 2500.

The resolution of a given model image can be measured with a single number, the coherence area
$A = \sum_x \sum_y (R(x,y))^2$. Given the coherence area $A$ and the number of pixels $n$ of
$M(x,y)$, the number of coherence cells is $c = n/A$. The number of coherence cells is
equivalent to the number of degrees of freedom of the model image. It can be used as a measure
of the information content of the model image.
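
The coherence area and the number of coherence cells defined above can be sketched with an FFT-based auto-correlation. This is a sketch under assumptions not stated in the text: the function name `coherence_cells` is hypothetical, the auto-correlation is taken over all lags with zero padding (so it is linear rather than circular), and $n$ is taken to be the pixel count of the array passed in.

```python
import numpy as np

def coherence_cells(model):
    """Return (coherence area A, number of coherence cells c = n / A)."""
    m = model.astype(float)
    rows, cols = m.shape
    # Zero-pad to (2*rows - 1, 2*cols - 1) so that every lag is covered
    # without wrap-around between opposite image borders.
    shape = (2 * rows - 1, 2 * cols - 1)
    spectrum = np.fft.rfft2(m, s=shape)
    autocorr = np.fft.irfft2(np.abs(spectrum) ** 2, s=shape)
    r = autocorr / np.sum(m * m)       # normalised: value 1 at zero lag
    area = float(np.sum(r ** 2))       # coherence area A
    return area, m.size / area
```

An image consisting of a single bright pixel has an auto-correlation that is nonzero only at zero lag, so $A = 1$ and $c = n$. A constant image has a slowly decaying auto-correlation and therefore far fewer coherence cells than pixels.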

We examine the information content of each model image to evaluate how useful the model image
is for the recognition task. All our model images $M(x,y)$ have the same number of pixels $n$.
Model images with low resolution (little structure), such as the European No-entry and Yield
signs, do not have enough information content for robust object recognition. This, and the
mean squared error in $r$ for small $p_T$, are responsible for the false positive matches
reported in Table 1. In order to avoid false matches, we need to avoid using such model images
with low information content.

The models that contribute to the false positive matches, E-no-entry and Yield, have a
coherence area of 313 and 197, respectively. This is much higher than the coherence area for
models with more reliable matching results. For example, the Footpath and Stop signs’
auto-correlation falls off much faster; their coherence areas are 148 and 56, respectively.
The number of coherence cells in E-no-entry is 297 and in Yield 473, but in Footpath it is 628
and in Stop, even 1641.

Thus, the number of coherence cells is a quantitative measure for determining if a model has
enough information content to be useful as a template. Most of the models we use have a large
enough number of coherence cells for robust detection, but subsequent downsampling in the
generation of templates may corrupt this.

8 Results on Noisy Images 

Gaussian noise is added to the brightness values of some 
of the scene images to examine the robustness of our 
algorithm. The algorithm is able to find the sign even 
in strongly degraded pictures. The signal-to-noise ratio
(SNR) of a noisy image, in dB, is defined as $10 \log_{10}$ of the ratio of the variance
of the noisy image to the variance of the noise.
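
The SNR definition above can be turned into a recipe for generating test images. This is a sketch under an added assumption: the noise is independent of the image, so the variance of the noisy image is the sum of the two variances and the noise variance follows as $\mathrm{var}(I) / (10^{\mathrm{SNR}/10} - 1)$. Under that assumption the definition only yields SNR values above 0 dB. The function names are hypothetical.

```python
import numpy as np

def add_gaussian_noise(image, snr_db, rng):
    """Add zero-mean Gaussian noise so that var(noisy)/var(noise) hits snr_db."""
    ratio = 10.0 ** (snr_db / 10.0)              # var(noisy) / var(noise)
    if ratio <= 1.0:
        raise ValueError("independent noise requires SNR > 0 dB here")
    noise_var = np.var(image) / (ratio - 1.0)
    noise = rng.normal(0.0, np.sqrt(noise_var), size=image.shape)
    return image + noise

def measured_snr_db(noisy, noise):
    """SNR in dB: variance of the noisy image over variance of the noise."""
    return 10.0 * np.log10(np.var(noisy) / np.var(noise))
```

A call such as `add_gaussian_noise(image, 3.0, np.random.default_rng(0))` gives a degraded image whose measured SNR is close to 3 dB, with the exact value varying from sample to sample.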


Several noisy images are obtained by corrupting image Slow3 by zero-mean Gaussian noise with
various signal-to-noise ratios. Our results for image Slow3 are summarized in Figure 10. Note
that the correlation increases as the signal-to-noise ratio increases.



Figure 10: Correlation coefficient for sign recognition in noisy versions of image Slow3,
plotted against the SNR of noisy Slow3 in dB.

Figure 11 shows images Slow3 and Slow4 corrupted by Gaussian noise with zero mean and SNR 3 dB
and 5 dB, respectively. Matches for pictures with much lower SNR are possible for templates
with a much larger number of pixels and information content than those presented. (In radar
and sonar, signals with negative SNR are commonly extracted given sufficient information
content.)

9 Conclusions 

Our method has been shown to efficiently recognize objects in complicated landscapes in the
presence of noise. To our knowledge, our work is the first to apply fast simulated annealing
to object recognition. Our results show that it makes the parameter search of object
recognition feasible.

We strongly advocate the use of template matching in recognition tasks and provide
quantitative techniques to analyze its limits. We show how to measure the information content
of templates as a way to make the recognition algorithm robust.

For the application of traffic signs, we have shown that the search space can be successfully
reduced by using a three-parameter transformation from model image to template. This method is
well suited for recognition tasks that involve objects with scale and shape variations. The
method is so efficient that templates can be constructed on-line during the search.

For future work, severe illumination variations within the object and occlusion problems can
be addressed. Other applications of our method, for example in medical computer vision and in
face recognition, are being investigated. A recent paper by Brunelli and Poggio [BP93] reports
successful face recognition using template matching. The authors normalize their test images
by fixing the direction of the eye-to-eye axis and the interocular distance. The locations of
the masks for eye, nose, mouth, and face templates are also fixed. We believe that we can
generalize Brunelli and Poggio’s application to recognize faces in images that are not
normalized but contain more general scenes with varied backgrounds.

Figure 9: Auto-correlation of model images Footpath, Stop, E-no-entry, and Yield. To illustrate
how fast the auto-correlation falls off, the e-folding lengths, i.e., pixels $(x,y)$ with
$R(x,y) \approx 1/e$, are shown on a dark contour.

Figure 11: The first and third images are images Slow3 and Slow4 degraded by Gaussian noise
with zero mean and SNR 3 dB and 5 dB, respectively. The second and fourth images illustrate
that the object is recognized where the templates computed are shown overlying the recognized
sign in the scene. (These images are shown brighter so that the overlying template can be
illustrated better.)